@michal-shalev
Contributor

@michal-shalev michal-shalev commented Aug 24, 2025

What?

Add support for configurable CUDA architecture targeting using Meson's built-in cuda_args and cuda_link_args options.

Why?

Replace the hardcoded CUDA architecture flags with configurable defaults, using Meson's built-in CUDA argument handling so that users can target other GPU architectures without editing meson.build.

How?

  • Leverage Meson's built-in cuda_args/cuda_link_args compiler options
  • Apply sensible defaults (compute_80, compute_90) only when the user hasn't specified custom values
  • Update the README with a CUDA architecture configuration section
  • Simplify the code by removing the custom flag-handling logic

Usage

# Uses defaults (Ampere & Hopper: compute_80, compute_90)
meson setup build

# Target specific architecture
meson setup build \
    -Dcuda_args="-gencode=arch=compute_75,code=sm_75" \
    -Dcuda_link_args="-gencode=arch=compute_75,code=sm_75"

# Target multiple architectures
meson setup build \
    -Dcuda_args="-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80" \
    -Dcuda_link_args="-gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80"

See README.md for complete documentation and additional examples.

@michal-shalev michal-shalev self-assigned this Aug 24, 2025
@michal-shalev michal-shalev requested a review from a team as a code owner August 24, 2025 08:23
copy-pr-bot bot commented Aug 24, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi michal-shalev! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

meson.build Outdated
nvcc_flags += ['-gencode', 'arch=compute_80,code=sm_80']
nvcc_flags += ['-gencode', 'arch=compute_90,code=sm_90']
nvcc_flags_link = []
if get_option('nvcc_gencode') != ''
Contributor

Why is this check needed? Wouldn't the for loop below simply not run anyway if the option is not set?

Contributor Author

Because with a string option, ''.split(' ') returns [''], so the loop runs once and adds an empty flag. I changed nvcc_gencode to an array and removed this check and split to avoid empty args.
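
The empty-string pitfall described in this reply can be reproduced in a few lines of Meson. This is a minimal sketch, not code from the PR: the project name and the variable names are made up for illustration, and the loop only mimics the shape of the original flag-building loop.

```meson
# Illustration only: project name and variable names are hypothetical.
project('split_demo')

# String-typed option left empty: split(' ') still yields one element.
empty_value = ''
from_string = empty_value.split(' ')
assert(from_string.length() == 1, 'split of an empty string yields one element')
assert(from_string[0] == '', 'that single element is the empty string')

# Array-typed option left empty: the loop body never runs,
# so no empty flag can end up on the nvcc command line.
empty_array = []
flags = []
foreach flag : empty_array
  flags += [flag]
endforeach
assert(flags.length() == 0, 'no flags are added for an empty array')
```

With an array-typed option the emptiness check becomes unnecessary, which is why the check and the split could be dropped together.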

option('cudapath_inc', type: 'string', value: '', description: 'Include path for CUDA')
option('cudapath_lib', type: 'string', value: '', description: 'Library path for CUDA')
option('cudapath_stub', type: 'string', value: '', description: 'Extra Stub path for CUDA')
option('nvcc_gencode', type: 'string', value: '-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_90,code=sm_90',
Contributor

It seems we can pass CUDA build flags through Meson's built-in options like this:

meson setup -Dcuda_args="-gencode=arch=compute_90,code=sm_90" \
            -Dcuda_link_args="-gencode=arch=compute_90,code=sm_90"

Can you please check it? If it works, there may be no need to add a project-specific option.

Contributor Author

I tested -Dcuda_args and -Dcuda_link_args and it's working.
I've updated the PR to use meson's built-in cuda_args and cuda_link_args instead of adding a custom option.

ovidiusm previously approved these changes Oct 6, 2025
'-gencode=arch=compute_80,code=sm_80',
'-gencode=arch=compute_90,code=sm_90'
]
default_cuda_link_args = [
Contributor

Remove this and use default_cuda_args instead.

Contributor Author

Note that previously we had nvcc_flags and nvcc_flags_link. I did not want to change that logic, only to use the built-in Meson options.

Contributor

I mean that default_cuda_link_args and default_cuda_args are identical, so we can remove default_cuda_link_args and use default_cuda_args at line 114 too.

Contributor Author

nvcc_flags and nvcc_flags_link were the same, so I can set default_cuda_link_args to default_cuda_args.

Contributor

I am suggesting that we remove default_cuda_link_args = {} at line 100 and replace all other occurrences of default_cuda_link_args with default_cuda_args.

cuda_args = default_cuda_args
add_project_arguments(cuda_args, language: 'cuda')
else
cuda_args = []
Contributor

I guess we need to call add_project_arguments with cuda_args_option here?

Contributor Author

No, Meson has built-in cuda_args and cuda_link_args options. If that's not clear, I can add another comment here.

Contributor

Makes sense, but why cuda_args = []? I would have assumed we need to pass it to the cuda.compiles() call below. Or is it implicitly passed already? In that case, could we remove args: cuda_args?
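
The pattern discussed in this thread, falling back to project defaults only when the built-in options are empty, can be sketched roughly as follows. This is not the merged code: the project name is hypothetical, and the sketch assumes that get_option('cuda_args') and get_option('cuda_link_args') return arrays once the CUDA language has been enabled. It deliberately leaves the compiles() question open.

```meson
# Sketch only: project name is hypothetical, not taken from the PR.
project('cuda_arch_defaults_demo', 'cpp')

# Enable CUDA only if a CUDA compiler is available on the build machine.
if add_languages('cuda', required: false, native: false)
  default_cuda_args = [
    '-gencode=arch=compute_80,code=sm_80',
    '-gencode=arch=compute_90,code=sm_90',
  ]

  # Assumption: an empty array means the user did not pass -Dcuda_args=...
  # In that case apply the project defaults; otherwise leave the user's
  # values to Meson's built-in option handling.
  if get_option('cuda_args') == []
    add_project_arguments(default_cuda_args, language: 'cuda')
  endif

  # The same list doubles as the link-time default, so a separate
  # default_cuda_link_args variable is not needed.
  if get_option('cuda_link_args') == []
    add_project_link_arguments(default_cuda_args, language: 'cuda')
  endif
endif
```

Both add_project_arguments and add_project_link_arguments have to run before the first build target is declared, so a block like this belongs near the top of meson.build.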

@michal-shalev
Contributor Author

/build

- `compute_80,code=sm_80`: NVIDIA Ampere (A100, A30)
- `compute_86,code=sm_86`: NVIDIA Ampere (RTX 30xx consumer)
- `compute_89,code=sm_89`: NVIDIA Ada Lovelace (RTX 40xx)
- `compute_90,code=sm_90`: NVIDIA Hopper (H100, H800, H200)
Contributor

Question 1: Do we support Blackwell? Why is it not listed?
Question 2: What value should be used when packaging the wheel, since it needs to cover every platform where a user may run pip install nixl?

Contributor Author

  1. I didn't test it on a cluster with Blackwell, and I don't think we'll have time to test it for this release.
  2. IMO it should stay the same; the defaults still apply.


4 participants